Cache-oblivious wavefront algorithms for dynamic programming problems: efficient scheduling with optimal cache performance and high parallelism

Authors

  • Jesmin Jahan Tithi
  • Pramod Ganapathi
  • Rezaul Chowdhury
  • Yuan Tang
Abstract

Wavefront algorithms are algorithms on grids in which execution proceeds in a wavefront manner from the start to the end of the computation (execution sweeps through the grid as if a wavefront were moving across it). Many dynamic programming problems and stencil computations are wavefront algorithms. Iterative wavefront algorithms for evaluating dynamic programming (DP) recurrences exploit optimal parallelism but show poor cache performance (PPoPP2015). Tiled-iterative wavefront algorithms achieve optimal cache performance and high parallelism, but they are cache-aware and hence neither portable nor cache-adaptive (i.e., they do not adapt to dynamic fluctuations in available cache space) (PPoPP2016). In contrast, standard cache-oblivious recursive divide-and-conquer (CORDAC) algorithms have optimal serial cache complexity but often have low parallelism due to artificial dependencies among subtasks (PPoPP2015). Cache-oblivious recursive wavefront algorithms for DP problems are variants of CORDAC algorithms with reduced or no artificial dependencies among subtasks. As a result, they often have asymptotically better parallelism than the corresponding CORDAC algorithms. In this research, we show how to systematically transform a standard CORDAC algorithm for a DP problem into a recursive wavefront algorithm that achieves optimal parallel cache performance and high parallelism under state-of-the-art schedulers for fork-join programs. These cache-oblivious wavefront algorithms use closed-form formulas to compute the execution timestep at which each task must be launched to achieve high parallelism without losing cache performance. We present experimental performance and scalability results showing the superiority of these new algorithms over existing ones for some popular DP problems on recent multicore and manycore architectures.
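To make the wavefront schedule concrete, here is a minimal Python sketch under stated assumptions: it uses the longest common subsequence (LCS) recurrence as a stand-in DP problem (not necessarily one of the problems evaluated in this work), and the function names are hypothetical. It evaluates the table with a tiled-iterative wavefront. Cell (i, j) depends on (i-1, j), (i, j-1) and (i-1, j-1), so tile (I, J) depends only on tiles whose index sums are smaller than I + J. Every tile on anti-diagonal t = I + J can therefore run in parallel at timestep t. Python threads are used only to illustrate the schedule; under the GIL they give no real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def lcs_reference(a: str, b: str) -> int:
    """Plain row-major evaluation of the LCS recurrence (serial baseline)."""
    n, m = len(a), len(b)
    X = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                X[i][j] = X[i - 1][j - 1] + 1
            else:
                X[i][j] = max(X[i - 1][j], X[i][j - 1])
    return X[n][m]


def lcs_tiled_wavefront(a: str, b: str, tile: int = 4, workers: int = 4) -> int:
    """Hypothetical tiled wavefront: tile (I, J) runs at timestep t = I + J."""
    n, m = len(a), len(b)
    X = [[0] * (m + 1) for _ in range(n + 1)]
    # Number of tiles along each dimension (ceiling division).
    ti = (n + tile - 1) // tile
    tj = (m + tile - 1) // tile

    def run_tile(I: int, J: int) -> None:
        # Fill tile (I, J) in row-major order. Every value read lies in this
        # tile or in tiles (I-1, J), (I, J-1), (I-1, J-1), all of which
        # belong to earlier wavefronts and are already complete.
        for i in range(I * tile + 1, min((I + 1) * tile, n) + 1):
            for j in range(J * tile + 1, min((J + 1) * tile, m) + 1):
                if a[i - 1] == b[j - 1]:
                    X[i][j] = X[i - 1][j - 1] + 1
                else:
                    X[i][j] = max(X[i - 1][j], X[i][j - 1])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Closed-form launch time for this simple schedule: t = I + J.
        for t in range(ti + tj - 1):
            # All tiles (I, J) with I + J == t and both indices in range.
            wave = [(I, t - I) for I in range(max(0, t - tj + 1), min(t, ti - 1) + 1)]
            # Tiles on one anti-diagonal write disjoint cells and do not read
            # each other, so they may run concurrently; list() waits for the
            # whole wavefront to finish (a barrier) and re-raises any error.
            list(pool.map(lambda p: run_tile(*p), wave))
    return X[n][m]
```

For example, `lcs_tiled_wavefront("ABCBDAB", "BDCABA", tile=2)` returns 4, matching `lcs_reference`. The explicit `tile` parameter is what makes this variant cache-aware in the abstract's sense, because its best value depends on the target machine's cache size. The cache-oblivious recursive wavefront algorithms described above avoid such a parameter and use their own closed-form launch timesteps for recursively generated subtasks; this sketch does not attempt to reproduce those formulas, and its per-wavefront barrier is a simplification.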

Publication date: 2016